{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Q Learning (Off Policy) with experience replay\n",
    "\n",
    "We will be using **TD control method of Q Learning** on Cliff World environment as given below:  \n",
    "\n",
    "![GridWorld](./images/cliffworld.png \"Cliff World\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Q Learning Update equation (remains same)\n",
    "\n",
    "Q Learning control is carried by sampling step by step and updating Q values at each step. We use ε-greedy policy to explore and generate samples. However, the policy learnt is a deterministic greedy policy with no exploration. The Update equation is given below:\n",
    "\n",
    "$$ \n",
    "\\DeclareMathOperator*{\\max}{max} Q(S,A) \\leftarrow Q(S,A) + \\alpha * [ R + \\gamma * \\max_{A'} Q(S’,A’) – Q(S,A)] $$\n",
    "\n",
    "\n",
    "In experience replay, we store the samples `(s, a, r, s', done)` in a buffer. The samples are generated using an exploratory behavior policy while we improve a deterministic target policy using Q-values. Therefore, we can always use older samples from behavior policy and apply them again and again. We can keep the buffer size fixed to some pre-determined size and keep deleting the older samples as we collect new ones. The process makes learning efficient by reusing a sample multiple times. Rest of the approach remains same. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial imports and enviroment setup\n",
    "import sys\n",
    "import gym\n",
    "import gym.envs.toy_text\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Q Learning Learning agent class\n",
    "from collections import defaultdict\n",
    "\n",
    "\n",
    "class QLearningAgent:\n",
    "    def __init__(self, alpha, epsilon, gamma, get_possible_actions):\n",
    "        self.get_possible_actions = get_possible_actions\n",
    "        self.alpha = alpha\n",
    "        self.epsilon = epsilon\n",
    "        self.gamma = gamma\n",
    "        self._Q = defaultdict(lambda: defaultdict(lambda: 0))\n",
    "\n",
    "    def get_Q(self, state, action):\n",
    "        return self._Q[state][action]\n",
    "\n",
    "    def set_Q(self, state, action, value):\n",
    "        self._Q[state][action] = value\n",
    "\n",
    "    # Q learning update step\n",
    "    def update(self, state, action, reward, next_state, done):\n",
    "        if not done:\n",
    "            best_next_action = self.max_action(next_state)\n",
    "            td_error = reward + self.gamma * \\\n",
    "                self.get_Q(next_state, best_next_action) - \\\n",
    "                self.get_Q(state, action)\n",
    "        else:\n",
    "            td_error = reward - self.get_Q(state, action)\n",
    "\n",
    "        new_value = self.get_Q(state, action) + self.alpha * td_error\n",
    "        self.set_Q(state, action, new_value)\n",
    "\n",
    "    # get best A for Q(S,A) which maximizes the Q(S,a) for actions in state S\n",
    "    def max_action(self, state):\n",
    "        actions = self.get_possible_actions(state)\n",
    "        best_action = []\n",
    "        best_q_value = float(\"-inf\")\n",
    "\n",
    "        for action in actions:\n",
    "            q_s_a = self.get_Q(state, action)\n",
    "            if q_s_a > best_q_value:\n",
    "                best_action = [action]\n",
    "                best_q_value = q_s_a\n",
    "            elif q_s_a == best_q_value:\n",
    "                best_action.append(action)\n",
    "        return np.random.choice(np.array(best_action))\n",
    "\n",
    "    # choose action as per epsilon-greedy policy for exploration\n",
    "    def get_action(self, state):\n",
    "        actions = self.get_possible_actions(state)\n",
    "\n",
    "        if len(actions) == 0:\n",
    "            return None\n",
    "\n",
    "        if np.random.random() < self.epsilon:\n",
    "            a = np.random.choice(actions)\n",
    "            return a\n",
    "        else:\n",
    "            a = self.max_action(state)\n",
    "            return a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot rewards\n",
    "def plot_rewards(env_name, rewards, label):\n",
    "    plt.title(\"env={}, Mean reward = {:.1f}\".format(env_name,\n",
    "                                                    np.mean(rewards[-20:])))\n",
    "    plt.plot(rewards, label=label)\n",
    "    plt.grid()\n",
    "    plt.legend()\n",
    "    plt.ylim(-300, 0)\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print policy Cliff world\n",
    "def print_policy(env, agent):\n",
    "    nR, nC = env._cliff.shape\n",
    "\n",
    "    actions = '^>v<'\n",
    "\n",
    "    for y in range(nR):\n",
    "        for x in range(nC):\n",
    "            if env._cliff[y, x]:\n",
    "                print(\" C \", end='')\n",
    "            elif (y * nC + x) == env.start_state_index:\n",
    "                print(\" X \", end='')\n",
    "            elif (y * nC + x) == nR * nC - 1:\n",
    "                print(\" T \", end='')\n",
    "            else:\n",
    "                print(\" %s \" %\n",
    "                      actions[agent.max_action(y * nC + x)], end='')\n",
    "        print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Replay Buffer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ReplayBuffer:\n",
    "    def __init__(self, size):\n",
    "        self.size = size  # max number of items in buffer\n",
    "        self.buffer = []   # array to hold buffer\n",
    "\n",
    "    def __len__(self):\n",
    "        return len(self.buffer)\n",
    "\n",
    "    def add(self, state, action, reward, next_state, done):\n",
    "        item = (state, action, reward, next_state, done)\n",
    "        self.buffer = self.buffer[-self.size:] + [item]\n",
    "\n",
    "    def sample(self, batch_size):\n",
    "        idxs = np.random.choice(len(self.buffer), batch_size)\n",
    "        samples = [self.buffer[i] for i in idxs]\n",
    "        states, actions, rewards, next_states, done_flags = list(zip(*samples))\n",
    "        return states, actions, rewards, next_states, done_flags"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# training algorithm with reply buffer\n",
    "def train_agent(env, agent, episode_cnt=10000, tmax=10000,\n",
    "                anneal_eps=True, replay_buffer=None, batch_size=16):\n",
    "    episode_rewards = []\n",
    "    for i in range(episode_cnt):\n",
    "        G = 0\n",
    "        state = env.reset()\n",
    "        for t in range(tmax):\n",
    "            action = agent.get_action(state)\n",
    "            next_state, reward, done, _ = env.step(action)\n",
    "            if replay_buffer:\n",
    "                replay_buffer.add(state, action, reward, next_state, done)\n",
    "                states, actions, rewards, next_states, done_flags = \\\n",
    "                        replay_buffer(batch_size)\n",
    "                for i in range(batch_size):\n",
    "                    agent.update(states[i], actions[i], rewards[i],\n",
    "                                 next_states[i], done_flags[i])\n",
    "            else:\n",
    "                agent.update(state, action, reward, next_state, done)\n",
    "\n",
    "            G += reward\n",
    "            if done:\n",
    "                episode_rewards.append(G)\n",
    "                # to reduce the exploration probability epsilon over the\n",
    "                # training period.\n",
    "                if anneal_eps:\n",
    "                    agent.epsilon = agent.epsilon * 0.99\n",
    "                break\n",
    "            state = next_state\n",
    "    return np.array(episode_rewards)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    This is a simple implementation of the Gridworld Cliff\n",
      "    reinforcement learning task.\n",
      "\n",
      "    Adapted from Example 6.6 (page 106) from Reinforcement Learning: An Introduction\n",
      "    by Sutton and Barto:\n",
      "    http://incompleteideas.net/book/bookdraft2018jan1.pdf\n",
      "\n",
      "    With inspiration from:\n",
      "    https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py\n",
      "\n",
      "    The board is a 4x12 matrix, with (using Numpy matrix indexing):\n",
      "        [3, 0] as the start at bottom-left\n",
      "        [3, 11] as the goal at bottom-right\n",
      "        [3, 1..10] as the cliff at bottom-center\n",
      "\n",
      "    Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward\n",
      "    and a reset to the start. An episode terminates when the agent reaches the goal.\n",
      "    \n"
     ]
    }
   ],
   "source": [
    "# create cliff world environment\n",
    "env = gym.envs.toy_text.CliffWalkingEnv()\n",
    "print(env.__doc__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a Q Learning agent\n",
    "agent = QLearningAgent(alpha=0.25, epsilon=0.2, gamma=0.99, \n",
    "                       get_possible_actions=lambda s : range(env.nA))\n",
    "\n",
    "# train agent using replay buffer and get rewards for episodes\n",
    "rewards = train_agent(env, agent, episode_cnt=5000, \n",
    "                      replay_buffer=ReplayBuffer(512))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAmZklEQVR4nO3deZwU5b3v8c+PEZkJq4GwOSiguIABDKMiaBwjUSMGUDHizUEIMQiBa67mdY3KcUskicaol2PUixwlGHLc0GBEOUpCoyguqICCgqCoIyMohn0d5nf+qJqxGbpn6+nppuv7fr3qRdfzVD/1/IqeX1c9tbS5OyIiEi1NMt0BERFpfEr+IiIRpOQvIhJBSv4iIhGk5C8iEkFK/iIiEaTkL9Uys9FmtjBufpuZdQ9fF5jZ381ss5k9HpbdamZfmtnnGejrWjMblKSu2MxKGrtPuc7MupqZm9khme6L1I2Sv2Bm55jZi2a21cy+MLMFZjYk0bLu3sLdPwxnhwMdgLbufrGZdQF+CfR0944J1rPSzH4UNz8wTBxVy7ZlMpmEXxRuZk9WKe8Tlscy1LXIMrMzzWx+uKOxNkH9/PCzu8XMlprZ0GraMjO7zcw2htPtZmZpDSALKflHnJkNBx4HZgCFBMn8RuCHtXj7kcAqdy+Lm9/o7huSLP8icEbc/HeB9xOUvRLXZm1iSMcXxRfAADNrG1c2CliVhnXVmZnlZWCdmdy73w48CPzfJPW/ADq5eytgLPAXM+uUZNmxwDCgD9AbOB+4okF7exBQ8s8wM+tsZrPCvZaPzOzKuLqbzewxM5sR7pUvN7OisO5aM3uiSlv/z8ym1GHdBtwJ/Mbdp7n7Zncvd/cF7v6zJO9xMzvazG4h+JK4JNxTvwJ4Aegczk9P8PYXCZJ7hdOB2xKUvRiua0gY8yYzi5nZ8XH9WGtmvzKzZcD2qokpHJKabmb/MrMVwEm13S6hPcDfgBFhe3nAj4CZVdZznJm9YGZfJTiyGWxmb4d7o5+a2c1xdRXDJaPM7JNwqGxSss6EsdxnZs+a2XbgzGSfHTPLN7OdZtYunP93Myszs1bh/K1mdncd+vhTM/sE+KeZ5ZnZHWF/PwQG13G71ou7v+7uDwMfJqlfFrfD4EBToEuS5kYBf3T3Enf/DPgjMLqBu5z93F1ThiaCL983CZLooUB3gg/3OWH9zcAu4DwgD/gd8GpYdySwA2gVzucBpUD/cP5eYFOSaVm4zHEEfyjdqunjaGBh3LwDR8f17y9xdcVASTVtHQGUA98MY98AFACfxpVtIvgyOIZgb+/7BH/I1wCrgUPDttYCSwj+wAviygaFr38PvBS22wV4t7q+VelnMVACDABeC8vOA/4buByIhWXNw77/BDgE+A7wJdArrp1vh3H1BtYDw8K6ruG2fCDcBn2A3cDxSfo0HdgMDAzb+wbVf3ZeBC4KXz8PrAF+EFd3QR36OCOMtQAYR3C01iXctvPDZQ5J0u9nSP45fKYefzODgLXVrGtX2J+5QJMky20GTombLwK2ZjofNPakPf/MOgn4lrv/2t33eDCW/gDh3mZoobs/6+77gIcJkgTu/jHwFsHhK8D3gB3u/mpY/3N3b5Nk6h2+p2JIozStUYbc/RPgE4K9+z7AB+6+E3g5riwfeA24BJjj7i+4+17gDoLkMyCuySnu/mnYRlU/Aia7+1fu/ilQ6yOiuP6+AnzTzI4FLiNIgvHOJ0hED7l7mbu/BcwiOBeCu8fc/R0PjqaWAf/F/kNcALe4+053XwosDbdBMrPd/WV3LydI2NV9dhYAZ4RHRL3D+M8ws3yCz91Ldejjze6+PdzOPwLuDrf7VwQ7JNVtw/Or+RyeX9176ypsryXhF3W4nRJpQfAFUGEz0CJq4/5K/pl1JMEwyaaKCbieYNy9QvxVMzuA/Lghjr8Cl4av/1c4Xxcbw3+TjY2mQ8XQz3cJExCwMK7sNXffDXQGPq54U/iH/ClweFxbn1azns5V6j9OtmANHgYmAmcCT1WpOxI4pcr/34+BjgBmdkrcicjNBHvN7aq0UfX/t0U1fYmPp6bPzgKCvfrvAO8QDMmdAfQHVrv7l3XoY/x6G2q7JmVm14dDh9vM7P66vNfd97r7c8A5luSiBWAb0CpuvhWwzcPDgKhQ8s+sT4GPquwNtXT382r5/seBYjMrBC4gLvmb2f1xf0BVp+XhYivDPlzUkEHVoCL5n87Xyf+luLIXw7J1BAkOqDw/0QX4LK6t6v5YS9l/zPeIevb3YeDnwLPuvqNK3afAgir/fy3cfXxY/1fgaaCLu7cG7gdS2buMj7emz84rwLEEn4sF7r6CYBsMJvhiqFCbPsavt07b1cyeq+Zz+FzCIN1/G27HFu4+rrr2q3EIcFSSuuXsf4TVJyyLFCX/zHod2BKeuCwIT6adYGa1Ojnp7l8AMeAhgkTwXlzduLg/oKpTr3AZB64GbjCzn5hZKzNrYmanmdnUBo828CJwIsFe6Mth2TtAN4K964rk/xgw2MzOMrOmBJeQ7iZIarXxGHCdmR0Wfjn+7/jK8ATq9JoacfePwr4mOhn7DHCMmY00s6bhdFLciemWwFfuvsvMTiY4Omso1X52wi+qN4EJfJ3sXyG4qiU++de1j48BV5pZoZkdBlxb3cLu/oNqPoc/qG2w4ecyn+D8j4UntQ8N644zsx+E26Gpmf0bwc7EgiTNzQCuNrPDzawzwWdrem37kiuU/DMoHMf/IdAX+IjgZOE0oHUdmvkrwUmwug75VPThCYLx9TEEe9vrgVuB2fVprxbrW0VworfU3TeFZeUEyawVYXJ395XAvwH/QbBdfgj80N331HJVtxAMSXxEcMLz4Sr1Xfj6y6emPi9093UJyrcCZxOMs68jGMK5DWgWLvJz4NdmtpXgxOxjtex7bfpUm8/OAoJk+XrcfEu+/oKtTx8fIDjxvZTgnNOT1S/eYL4L7ASeJTja2Enw/wrBkcrNBJ+rLwgu+7wkPAeDmZ1uZtvi2vr/wN8JdjreBeaEZZFiERvmEiHcY1wK9A5PJotEjpK/iEgEZWzYx8zOteCmmNVmVu24oYiINKyM7PlbcLfkKoIbeEqAN4BLwysSREQkzTK1538ywbXGH4Yn8B4Bkj6ISUREGlamHtR0OPvfKFICnFJ1ITMbS/AQJgoKCvp16ZLsUR3VKy8vp0mTaF3YpJijIWoxRy1eSD3mVatWfenu36panqnkn+hGlwPGn9x9KjAVoKioyBcvXlyvlcViMYqLi+v13oOVYo6GqMUctXgh9ZjNLOFd2Jn6Ci1h/7sECwmukxYRkUaQqeT/BtDDzLqF11yPILjFXEREGkFGhn3cvczMJhLcKZgHPOjukXu2hohIpmTsl3nc/VmCW7VFRKSRReu0uYiIAEr+IiKRpOQvIhJBGRvzz6QJf32LOcvS88uFhx4SfJ/uKStPWF6TivfVdvlkysvLaTIv4W9l5CzFnPuiFi8EMb87cB/5TfMatN1IJf95K9ZzfOdWaUv8hYcVcH7vzmzdtZeZr32yX92Ygd1q1cb9C9bUaflkPvnkE444or4/XnVwUsy5L2rxQhBzXpOG/3nhSCX/y2csps03mqbczvu/OZfjbphbOX/D+T35zTMreGj0SfTo0JKSf+1g5muf0DL/EK74bncefvVjrv3BcbVqu2xfOS998GWtl08mFvuc4uLU2jjYKObcF7V4IYi5aV7Dj9BHKvkDbNqR+m935DfNY8aYk7nswde58ntH89PTuvHT0xLvqU/8Xg8mfq9Hrdv+9/N7ptw/EZGaROKE79Y9zr2x1Sm18cBlRfvNn96jHdMuK+LKs2qf2EVEskUk9vz/853dLPliZUptfL9nh/3mzYxBVcpERA4Wkdjz31mmn6oUEYkXieTf2KnfLDgz3/Dn50VEGkYkkr+IiOwvEsk/Az9TLCKS1aKR/DPdARGRLBON5K/sLyKyn0gk//KaF2lQFSd6K078iohkm5xP/m9+/C8+2tzY6V9EJLvlfPK/L7Ym010QEck6OZ/8dbpXRORAOZ/8G+JBbiIiuSbnk//ajTsafZ0V53l1vldEslXOJ38RETlQBJK/xvxFRKrK+eSvG7xERA6U88k/E0zP8xSRLJfzyT+TO/76ChCRbJX7yV/jPiIiB0hb8jezm83sMzNbEk7nxdVdZ2arzWylmZ2Trj6IiEhi6f4N37vc/Y74AjPrCYwAegGdgXlmdoy770tzX0REJJSJYZ+hwCPuvtvdPwJWAyena2WZGPTRzV0iku3SnfwnmtkyM3vQzA4Lyw4HPo1bpiQsS4tMDvnrkc4ikq1SGvYxs3lAxwRVk4D7gN8Q7Hz/BvgjMIbEF8EkTNFmNhYYC9ChQwdisVid+1hW1jDP9olfd039+Neu4BHSe/fsqVefG8K2bdsytu5MUcy5L2rxQvpiTin5u/ug2ixnZg8Az4SzJUCXuOpCYF2S9qcCUwGKioq8uLi4zn08ZMHzsDf1L4Di4mKYO+fr19VYv2UXxP5B00MPrXHZdInFYhlbd6Yo5twXtXghfTGn82qfTnGzFwDvhq+fBkaYWTMz6wb0AF5PVz90qaeIyIHSebXP7WbWl2BIZy1wBYC7Lzezx4AVQBkwIdeu9NFIv4hku7Qlf3cfWU3dZGByutYdb8uussZYTUL6EhCRbJXzd/iKiMiBlPxFRCJIyV9EJIKU/NNBg/0ikuWU/NNIN/iKSLZS8hcRiSAlfxGRCFLyFxGJICV/EZEIUvJPK53xFZHspOSfDnqWnIhkOSV/EZEIUvIXEYkgJX8RkQhS8k8j3eErItlKyT8NdL5XRLKdkr+ISAQp+aeBRntEJNsp+aeBhn1EJNsp+aeRjgBEJFsp+aeBa9dfRLKckr+ISAQp+aeBru8XkWyn5J8GGvYRkWyn5J9GOgIQkWyl5J8Gros9RSTLKfmLiESQkn8amK7wF5Esl1LyN7OLzWy5mZWbWVGVuuvMbLWZrTSzc+LK+5nZO2HdFLPcGxnXsI+IZLtU9/zfBS4EXowvNLOewAigF3AucK+Z5YXV9wFjgR7hdG6KfchaOgIQkWyVUvJ39/fcfWWCqqHAI+6+290/AlYDJ5tZJ6CVuy9ydwdmAMNS6UM20qWeIpLtDklTu4cDr8bNl4Rle8PXVcsTMrOxBEcJdOjQgVgs1uAdra34ddfUj407ywHYvXt3xvq8bdu2jG6vTFDMuS9q8UL6Yq4x+ZvZPKBjgqpJ7j472dsSlHk15Qm5+1RgKkBRUZEXFxdX39lE5s6p+3sSKC4urmyrpn6Ubt4JC/5Js2bNalw2XWKxWMbWnSmKOfdFLV5IX8w1Jn93H1SPdkuALnHzhcC6sLwwQXlO0bCPiGS7dF3q+TQwwsyamVk3ghO7r7t7KbDVzPqHV/lcBiQ7ejjo5d51TCKSK1K91PMCMysBTgXmmNl/A7j7cuAxYAUwF5jg7vvCt40HphGcBF4DPJdKH7KRdvxFJNuldMLX3Z8CnkpSNxmYnKB8MXBCKusVEZHU6A7fNNBoj4hkOyX/NNCwj4hkOyX/NNIRgIhkKyV/EZEIUvIXEYkgJX8RkQhS8k8D1y2+IpLl0vVgt5zx0OiT6n2nbg7+VIGI5Agl/xqceVz7THdBRKTBKfmnQafWBZzdswPjio/KdFdERBJS8g/16dKG5Z9tpqw8GK8/qeth/Mel36lXW3lNjKmXFdW8oIhIhuiEb5y3bvx+5evHxw2gY+v8/eovKerCMR1aNHa3REQanPb847TKb8rsCQN5ec2XCetvG967kXskIpIeSv6hiuty+nRpQ58ubTLZFRGRtNOwj4hIBCn5i4hEkJK/iEgEKfmLiESQkr+ISAQp+YuIRJCSv4hIBCn5i4hEkJK/iEgEKfmLiESQkr+ISAQp+YuIRJCSv4hIBKWU/M3sYjNbbmblZlYUV97VzHaa2ZJwuj+urp+ZvWNmq81siumHbkVEGl2qe/7vAhcCLyaoW+PufcNpXFz5fcBYoEc4nZtiHxqEvoJEJEpSSv7u/p67r6zt8mbWCWjl7ovc3YEZwLBU+iAiInWXzh9z6WZmbwNbgH9395eAw4GSuGVKwrKEzGwswVECHTp0IBaLpa2zWzZvSWv7jW3btm05FU9tKObcF7V4IX0x15j8zWwe0DFB1SR3n53kbaXAEe6+0cz6AX8zs158/YNZ8TzZut19KjAVoKioyIuLi2vq7oHmzqnVYq1at6K4eGDd289SsViMem2vg5hizn1RixfSF3ONyd/dB9W1UXffDewOX79pZmuAYwj29AvjFi0E1tW1/YbQuXU+6zbvysSqRUQyLi2XeprZt8wsL3zdneDE7ofuXgpsNbP+4VU+lwHJjh5ERCRNUr3U8wIzKwFOBeaY2X+HVd8FlpnZUuAJYJy7fxXWjQemAauBNcBzqfRBRETqLqUTvu7+FPBUgvJZwKwk71kMnJDKekVEJDWRvcNX95aJSJRFNvmLiESZkr+ISARFLvm3a9Es010QEcm4yCX/2y76NgDB0yVERKIpcsm/aV4Qsk74ikiURS75a39fRCSCyT8ZHQeISJQo+YuIRJCSv4hIBEUu+esqHxGRCCb/CrrYR0SiLLLJX0QkyiKV/K84o3vla43+iEiURSr5H/HNb+jmLhERIpb8QSd8RUQggsm/gg4ARCTKIpv8RUSiTMlfRCSCIpX8DdOD3UREiFjy97jUrzF/EYmySCV/EREJKPmHdP2/iERJZJO/LvcXkSiLVPI3/WSLiAgQseQfT6M8IhJlkU3+IiJRllLyN7M/mNn7ZrbMzJ4yszZxddeZ2WozW2lm58SV9zOzd8K6KaYzrSIijS7VPf8XgBPcvTewCrgOwMx6AiOAXsC5wL1mlhe+5z5gLNAjnM5NsQ8iIlJHKSV/d3/e3cvC2VeBwvD1UOARd9/t7h8Bq4GTzawT0MrdF3nweM0ZwLBU+lD3Tjfq2kREstIhDdjWGODR8PXhBF8GFUrCsr3h66rlCZnZWIKjBDp06EAsFkupg6tWreSL/GCUadfOXfvVbd68OeX2s8m2bdtyKp7aUMy5L2rxQvpirjH5m9k8oGOCqknuPjtcZhJQBsyseFuC5b2a8oTcfSowFaCoqMiLi4tr6u6B5s6pfHnMMcfSqXU+vPkGBQUFsHNHZV3r1q0pLh5Q9/azVCwWo17b6yCmmHNf1OKF9MVcY/J390HV1ZvZKOB84Cz/+pdSSoAucYsVAuvC8sIE5Y3ONf4jIhGW6tU+5wK/Aoa4+464qqeBEWbWzMy6EZzYfd3dS4GtZtY/vMrnMmB2Kn0QEZG6S3XM/x6gGfBCeMXmq+4+zt2Xm9ljwAqC4aAJ7r4vfM94YDpQADwXTo1Od/uKSJSllPzd/ehq6iYDkxOULwZOSGW99aU7CkREArrDV0QkgpT8QzooEJEoiVzy11U+IiIRTP4VNP4vIlEWqeSvfC8iEohU8hcRkUCkkr8DbZs3A+Dob7XIbGdERDKoIR/sdlDo06UNf/3ZKbRt3ox/vL8h090REcmISO35VxhwVDsOydMZABGJrkglf6V7EZFApJK/iIgElPxFRCIossn/m984NNNdEBHJmMgm/8OaK/mLSHRFKvnrkQ4iIoFIJf/q6ItBRKJEyV9EJIIilfxdT3MWEQEilvxFRCSg5C8iEkGRSv46qSsiEohU8hcRkYCSv4hIBCn5i4hEkJK/iEgERSr5m57oLyICRCz5i4hIIKXkb2Z/MLP3zWyZmT1lZm3C8q5mttPMloTT/XHv6Wdm75jZajObYqYLMEVEGluqe/4vACe4e29gFXBdXN0ad+8bTuPiyu8DxgI9wuncFPtQa46e7yAiAikmf3d/3t3LwtlXgcLqljezTkArd1/k7g7MAIal0gcREam7hhzzHwM8FzffzczeNrMFZnZ6WHY4UBK3TElY1iiqO+Grk8EiEiWH1LSAmc0DOiaomuTus8NlJgFlwMywrhQ4wt03mlk/4G9m1gsSZtikYzFmNpZgiIgOHToQi8Vq6m613l/5PrHtaxLWbdq8KeX2s8m2bdtyKp7aUMy5L2rxQvpirjH5u/ug6urNbBRwPnBWOJSDu+8Gdoev3zSzNcAxBHv68UNDhcC6atY9FZgKUFRU5MXFxTV190Bz51S+PO7Y4yg+qUvCujat21BcfGrd289SsViMem2vg5hizn1RixfSF3OqV/ucC/wKGOLuO+LKv2VmeeHr7gQndj9091Jgq5n1D6/yuQyYnUofRESk7mrc86/BPUAz4IXwis1Xwyt7vgv82szKgH3AOHf/KnzPeGA6UEBwjuC5qo2KiEh6pZT83f3oJOWzgFlJ6hYDJ6Sy3nrTOV0REUB3+IqIRJKSv4hIBCn5i4hEkJK/iEgERSr5D+nTOdNdEBHJCpFK/vlN8zLdBRGRrBCp5C8iIgElfxGRCEr1Dl8RCe3du5eSkhJ27drVaOts3bo17733XqOtL9OiFi/UPub8/HwKCwtp2rRprdpV8q+gu38lRSUlJbRs2ZKuXbvSWD9Qt3XrVlq2bNko68oGUYsXahezu7Nx40ZKSkro1q1brdrVsI9IA9m1axdt27ZttMQvUsHMaNu2bZ2OOpX8RRqQEr9kSl0/e0r+IiIRpOQvkkNKSkoYOnQoPXr0oHv37kycOJHdu3cnXHb69OlMnDix0fr29NNP8/vf/z6lNkaOHMmHH34IQNeuXfn2t79N7969OeOMM/j444/r1ebatWs54YTUHzS8du1aCgoK6Nu3L3369GHAgAGsXLmysv7SSy+ld+/e3HXXXbz//vv07duXE088kTVrEv+6YCIjRozggw8+SLmvoOQvkjPcnQsvvJBhw4bxwQcf8MEHH7Bz506uueaaRuvDvn37ktYNGTKEa6+9tt5tL1++nH379tG9e/fKsvnz57Ns2TKKi4u59dZb6912QznqqKNYsmQJS5cuZdSoUfz2t78F4PPPP+eVV15h2bJlXHXVVfztb39j6NChvP322xx11FG1anvfvn2MHz+e22+/vUH6qqt9RNLglr8vZ8W6LQ3aZs/Orbjph72S1v/zn/8kPz+fn/zkJwDk5eVx1113ceSRRzJ58mRatGhRq/X85S9/YcqUKezZs4dTTjmFe++9l7y8PMaPH88bb7zBzp07GT58OLfccgsQ7IGPGTOG559/nokTJ3LttdcyatQo/v73v7N3714ef/xxjjvuOKZPn87ixYu55557GD16NK1atWLx4sV8/vnn3H777QwfPpzy8nImTpzIggUL6NatG+Xl5YwZM4bhw4czc+ZMBg8enLDPp556KlOmTAHgiy++YNy4cXzyyScA3H333QwcOJCbb76ZNWvW8Nlnn/Hpp59yzTXX8LOf/Wy/dtauXcvIkSPZvn07APfccw8DBgxg5MiRDB8+nKFDhwLw4x//mEsuuYQhQ4Yk3Y5btmzhsMMOA+Dss89mw4YN9O3blwsuuID77ruPvLw8XnzxRebPn590m7do0YIJEyYQi8X44x//yOmnn87o0aMpKyvjkENSS9/a8xfJEcuXL6dfv377lbVq1YquXbuyevXqWrXx3nvv8eijj/Lyyy+zZMkS8vLymDlzJgCTJ09m8eLFLFu2jAULFrBs2bLK9+Xn57Nw4UJGjBgBQLt27XjrrbcYP348d9xxR8J1lZaWsnDhQp555pnKI4Inn3yStWvX8s477zBt2jQWLVpUufzLL79M3759E7Y1d+5chg0bBsAvfvELrrrqKt544w1mzZrF5ZdfXrncsmXLmDNnDosWLeLXv/4169bt/xPi7du354UXXuCtt97i0Ucf5corrwTg8ssv56GHHgJg8+bNvPLKK5x33nkH9GPNmjX07duXo446ijvvvJOrr74aCIa8Ko4KbrrpJsaNG8dVV13F/Pnzq93m27dvp2fPnrz22mucdtppNGnShKOPPpqlS5cm3A51oT1/kTSobg89Xdw94RUf7l7rNv7xj3/w5ptvctJJJwGwc+dO2rdvD8Bjjz3G1KlTKSsro7S0lBUrVtC7d28ALrnkkv3aufDCCwHo168fTz75ZMJ1DRs2jCZNmtCzZ0/Wr18PwMKFC7n44otp0qQJHTt25Mwzz6xcvrS0lHbt2u3Xxplnnsn69etp37595bDPvHnzWLFiReUyW7ZsYevWrQAMHTqUgoICCgoKOPPMM3n99df3+0LZu3cvEydOrEzCq1atAuCMM85gwoQJbNiwgSeffJKLLroo4Z53RYIHePTRRxk7dixz585NtrmB6rd5Xl5e5dFGhfbt27Nu3boDvujrSslfJEf06tWLWbP2//XULVu2sH79eo499lj+9Kc/8cADDwDw7LPPJmzD3Rk1ahS/+93v9iv/6KOPuOOOO3jjjTc47LDDGD169H7XlDdv3ny/5Zs1awYEyausrCzhuiqWqVhv/L+JFBQUHHAd+/z582nevDmjR4/mxhtv5M4776S8vJxFixZRUFBwQBtVvxyrzt9111106NCBpUuXUl5eTn5+fmXdyJEjmTlzJo888ggPPvhg0n5WGDJkSOUQXHWSbXMIjqjy8vZ/IOWuXbsSxlZXGvYRyRFnnXUWO3bsYMaMGUBwgvCXv/wlEydOpKCggAkTJrBkyRKWLFlC586JH29+1lln8cQTT7BhwwYAvvrqKz7++GO2bNlC8+bNad26NevXr+e5555LSwynnXYas2bNory8nPXr1xOLxSrrjj/++MorfeIVFBRw9913M2PGDL766ivOPvts7rnnnsr6ij1xgNmzZ7Nr1y42btxILBar3NuusHnzZjp16kSTJk14+OGH9zuBPXr0aO6++24g+KKtycKFC2t1MjfZNk9m1apVtVp/TZT8RXKEmfHUU0/xxBNP0KNHD9q2bUuTJk2YNGlS0vdMnz6dwsLCyqlVq1bceuutnH322fTu3Zvvf//7lJaW0qdPH0488UR69erFmDFjGDhwYFpiuOiiiygsLOSEE07giiuu4JRTTqF169YADB48mJdeeinh+zp16sSll17Kn/70J6ZMmcLixYvp3bs3PXv25P77769c7uSTT2bw4MH079+fG2644YAvwZ///Of8+c9/pn///qxatWq/I5oOHTpw/PHHV7s3XzHm36dPH66//nqmTZtWY8w9e/ZMuM0TWb9+PQUFBXTq1KnGdmvk7gfF1K9fP6+PI3/1TOVUXd3F979Sr/az1fz58zPdhUaX6ZhXrFjR6OvcsmVL0rqXX37ZjzjiCF+8eHEj9ih1W7dudXf3L7/80rt37+6lpaXu7r5jxw4vKirysrKyerV70003+R/+8Id692v79u3evXt337RpU73bqI/4/+M777zTp02blnTZRJ9BYLEnyKka8xfJUQMGDKj3jU+ZdP7557Np0yb27NnDDTfcQMeOHYFgeOf666/ns88+44gjjmjUPs2bN48xY8Zw9dVXVx6JZEKbNm0YOXJkg7Sl5C8iWSV+nL+qQYMG1fupnjfffHP9OhSut+K+gUyqzQnk2tKYf0iP45KG4HW4rFKkIdX1s6fkL9JA8vPz2bhxo74ApNF5+Dz/+EtTa6JhH5EGUlhYSElJCV988UWjrXPXrl11+oM/2EUtXqh9zBW/5FVbSv4iDaRp06a1/hWlhhKLxTjxxBMbdZ2ZFLV4IX0xpzTsY2a/MbNlZrbEzJ43s85xddeZ2WozW2lm58SV9zOzd8K6KaZfvxARaXSpjvn/wd17u3tf4BngRgAz6wmMAHoB5wL3mlnFPcr3AWOBHuF0bop9EBGROkop+bt7/DNrmwMVZ7qGAo+4+253/whYDZxsZp2AVu6+KLz5YAYwLJU+NBSdohORKEl5zN/MJgOXAZuBikfwHQ68GrdYSVi2N3xdtTxZ22MJjhIAtpnZymTL1qAd8KXdlnyBjwEbV8/Ws1M74MtMd6KRKebcF7V4IfWYj0xUWGPyN7N5QMcEVZPcfba7TwImmdl1wETgJhJfNu/VlCfk7lOBqTX1sSZmttjdi1Jt52CimKMhajFHLV5IX8w1Jn93H1TLtv4KzCFI/iVAl7i6QmBdWF6YoFxERBpRqlf79IibHQK8H75+GhhhZs3MrBvBid3X3b0U2Gpm/cOrfC4DZqfSBxERqbtUx/x/b2bHAuUEw+bjANx9uZk9BqwAyoAJ7l7xYOzxwHSgAHgunNIt5aGjg5BijoaoxRy1eCFNMZtuRRcRiR4920dEJIKU/EVEIiink7+ZnRs+XmK1mV2b6f6kwsweNLMNZvZuXNk3zewFM/sg/PewuLqD/vEaZtbFzOab2XtmttzMfhGW52zcZpZvZq+b2dIw5lvC8pyNGcDM8szsbTN7JpzP9XjXhn1dYmaLw7LGjTnRz3vlwgTkAWuA7sChwFKgZ6b7lUI83wW+A7wbV3Y7cG34+lrgtvB1zzDeZkC3cDvkhXWvA6cS3HPxHPCDTMdWTcydgO+Er1sCq8LYcjbusH8twtdNgdeA/rkcc9jXqwkuF38mIp/ttUC7KmWNGnMu7/mfDKx29w/dfQ/wCMFjJw5K7v4i8FWV4qHAn8PXf+brR2UcdI/XSMTdS939rfD1VuA9gjvCczZuD2wLZ5uGk5PDMZtZITAYiP+185yNtxqNGnMuJ//DgU/j5qt9lMRBqoMH904Q/ts+LE8W++HU4fEa2cTMugInEuwJ53Tc4RDIEmAD8IK753rMdwPXEFwyXiGX44XgC/15M3szfIwNNHLMufw8/zo9SiLHNMjjNbKFmbUAZgH/x923VDOsmRNxe3BPTF8zawM8ZWYnVLP4QR2zmZ0PbHD3N82suDZvSVB20MQbZ6C7rzOz9sALZvZ+NcumJeZc3vNP9oiJXLI+PPQj/HdDWJ4zj9cws6YEiX+muz8ZFud83ADuvgmIETz2PFdjHggMMbO1BEOz3zOzv5C78QLg7uvCfzcATxEMUzdqzLmc/N8AephZNzM7lOD3BZ7OcJ8a2tPAqPD1KL5+VEZOPF4j7ON/Au+5+51xVTkbt5l9K9zjx8wKgEEEj03JyZjd/Tp3L3T3rgR/o/90938jR+MFMLPmZtay4jVwNvAujR1zps96p3MCziO4QmQNwVNIM96nFGL5L6CUrx+L/VOgLfAP4IPw32/GLT8pjHslcVcAAEXhB20NcA/hXd7ZOAGnERzGLgOWhNN5uRw30Bt4O4z5XeDGsDxnY47rbzFfX+2Ts/ESXIG4NJyWV+Smxo5Zj3cQEYmgXB72ERGRJJT8RUQiSMlfRCSClPxFRCJIyV9EJIKU/EVEIkjJX0Qkgv4HjSgwjeEB4ZcAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# plot per episode reward\n",
    "plot_rewards(\"Cliff World\", rewards, 'Q-Learning(Replay Bffer)')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " v  >  >  ^  >  >  >  >  >  >  ^  v \n",
      " ^  ^  >  >  >  v  >  >  >  v  >  v \n",
      " >  >  >  >  >  >  >  >  >  >  >  v \n",
      " X  C  C  C  C  C  C  C  C  C  C  T \n"
     ]
    }
   ],
   "source": [
    "# print policy learnt by the agent\n",
    "print_policy(env, agent)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Q Learning for \"Taxi\" environment "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAk/klEQVR4nO3deZxU1Zn/8c/TTUOzo4iIgIAGJwoCQosQ0fSoKC4jhCWBSQYzWVBGTMyYqIgTcWGik8VRM9HBZIL8QoIGRIkGRSUlLuAWjQooghLTgguggRbBBp7fH/d29+3uqt6qq6u77vf9etWrq865y3O64alT5557ytwdERGJl7xsByAiIs1PyV9EJIaU/EVEYkjJX0QkhpT8RURiSMlfRCSGlPyl1TGzFWZ2YbbjyDVmtsDMbsx2HNI8lPwlo8xsnZmVho8DZrY38vrqxhzT3c9x97vref6EmbmZDa1Wfn9YXtyYGKR+LHCjmb1rZn8P/x6Datl+mJm9aGZ7wp/DmjHcWFHyl4xy90Hu3sndOwFPArPKX7v7fzZTGBuB6eUvzKw7MAr4sJnOn5KZtcnSefOb6VRTgG8ApwKHAmuA/5ciprbAA8BvgEOAu4EHwnJpYkr+MWBmR5rZUjP70MzeNrPvROrmmtm9ZrbQzHaHPfWisO4qM1tS7Vi3mtltTRDTMWa2ysx2mNl2M1tkZt0idTvNbHgk/u3lvfSw9/itBpxuEfCVSMKbBiwDPovEkxe2d3MY071mdmik/vdm9l7Ye10d7b2GwyX/Y2YPhb/DZ83smBTt7h9+4vimmb0DrArLv2FmG8zsIzN7xMz6heXXmdnt4fMCM/vEzP4rfN0+/CR1SD1jvMPM/mhmnwD/aGYnmtmfw5jvAQob8DutrwHAU+7+lrsfIEjsx6fYthhoA/y3u+9z99sAA07PQFyxp+Sf48wsD/gD8BegN3AGcJmZnR3Z7AJgMdANWA78PCz/HXCumXUJj5UPfBn4bfj6F2b2cYrHK3WFBvwIOBI4DugLzAVw983AlcAiM+sA/BpY4O6JRv4atgLrgbPC19OBhdW2+Q4wAfhiGNNHwP9E6lcAA4HDgT8TvKFETQOuI+ixbgLm1RHTFwnafbaZTQCuBiYCPQg+If0u3O4JgqQIcBLwXrgvwGjgDXf/qJ4x/nMYV2fgOeB+gl74ocDvgUmpgjWzMbX8rT82szEpdl0MfM7MjjWzAuBC4OEU2w4CXvGqa868EpZLU3N3PXL4AZwMvFOtbDbw6/D5XOCxSN3xwKeR108B08PnY4HNacSSAL6Vom4C8FK1suXAqwQJoF19jpPqnMDXCBLqPwAbw7oSoDh8vgE4I7JfL6AMaJPkmN0AB7qGrxcAv4zUnwu8niKe/uG+R0fKVgDfjLzOA/YA/YD2wF6gO3AVwZtECdCJ4M3mthTnSRbjwkj9aQRvihYpewa4sYn//bUFbg1j2Q+8DQxIse1/AIurlS0C5jbH/5W4PdTzz339gCOjvTSCBNIzss17ked7gMLIWPRvCXq1EPQcf9sUQZnZ4Wa22IILgbsIhgMOq7bZXcBg4HZ335fmKe8jGD64lORjzv2AZZHf0QbgANDTzPLN7KZwSGgXsCXcJxpv9d9hpzri+Vu1c98aOfdOgk9Gvd39U+AFgt7+aQSfBJ4BTgnLnoDgU1k9Yoye80jgXQ8zbOivdcTcGNcSfGLpSzCsdB2wKvxEV10p0KVaWRdgdwbiij0l/9z3N+Btd+8WeXR293Pruf/vgWIz6wN8iUjyN7M7rXLmTvXHujqO+yOC3uAQd+9C0DO3yLE7Af8N/AqYGx1/bwx330PQw55J8uT/N+Ccar+nQnd/l+BNbzxwJtCVoPdONN7GhFTt3BdVO3d7d38mrH+C4I3rROD58PXZwEhgdbhNfWKMnnMb0NvMovVHpQrWzE6t5W9damanpth1KHCPu5e4+353X0AwNJZs3H8dMKRaTEPCcmliSv657zlgl5ldGV4gzDezwWZ2Un12dvcPCYZOfk3wJrIhUnexV87cqf6oa5y2M0FP72Mz6w38oFr9rcCL7v4t4CHgzmQHiVxA7V+P5lwNfNHdtySpuxOYF7nQ2sPMxkdi3QfsADoATT1L6U5gdvkFWjPramZTIvVPEFynWO/un1E5lPV2+PdpTIxrCIZhvmNmbcxsIsGbSVLu/mQtf+tO7v5kil2fB6aYWU8LLqr/C1BAcF2kugTBp63vmFk7M5sVlq+qoy3SCEr+Oc6DGRb/BAwjGG/dDvySoHdYX78l6FE2yZBP6DpgOPB3guR+X3lFmHTHAReHRf8ODDezryY5Tl+C4Yp36zqhu29196dSVN9KcI1hpZntBtYSXC+B4OJw+TnWh3VNxt2XATcDi8Mhm9eAcyKbPEMw9l/ey19PcB1gdWSbBsUYvolMBL5OcHH7K0T+Bk3oZoLJBi8DHwPfAya5+8dQccPe1ZGYJhC80X1MMEV0QlguTcyqDvmJtC5mdg3wobv/b7ZjEWlNlPxFRGIoa8M+ZjbOzN4ws01mdlW24hARiaOs9PzDm4U2EswbLyG4KDTN3dc3ezAiIjGUrZ7/SGCTB7d8f0ZwF+D4OvYREZEmkpVFpQiWGYjecFJC5cyKCmY2A5gB0L59+xF9+/Zt1MkOHjxIXl7l+9yWXQcbdZxsyjc4UO1DmhFM3O7XJY+yA/BJmfP3z6pu1L9LZbu3f+qUljmHtTc6FSSfol79dxPdH2Dvfihs4L+afQegIA/yUsyKb8wxk6n+d25O5b+36r+vTMtmm7Mhbu2F9Nu8cePG7e7eo3p5tpJ/sjRQY/zJ3ecD8wGKior8hRdeaNTJEokExcXFFa/7X/VQo47TXM4f0osHX9kGQOL7xRx1aAe+vfAFHn/9gyrbFeQbZQecV28cR7s2+fzvE5v50YrXq2zzxk3nVTz/93tf5r4/v8tPpgxl8og+Sc9d/XcT3b+lq/53bk7lv7fm/n1ls83ZELf2QvptNrOkd25n6y20hGB+drk+BOuMCFB+Geb2aSfS/7CO5OUZluTtsvgfDgegTQN7BZrhJSLZ6vk/Dww0swEEN6VMJbg9XYD9B4MhhDaRcZLoHe+Xnv45po48iu4d2/Lh7n3kh9ud/vnDa/T8oyyt1QhEJJdkpefv7vuBWcAjBAto3evuWr8jdDDsmOdFkn90vPziLx5D727tKSzIp++hletjDezZmS31GHZQv19EstXzx93/CPwxW+dvbfIiPf+O7Rr3Zzvt2MNY+ucSjjui+sKJVXVtX8DfPy1r1Dmk9SorK6OkpIS9e/dmO5SUunbtyoYNG+reMIfUt82FhYX06dOHgoKCeh03a8lfqnr9hnF8/j+C77hINiSfl2zQv4HGD+tN8bGH07VD6n8cG64fhxlcfd+rdGqKKTjSapSUlNC5c2f69+9fZZixJdm9ezedO3fOdhjNqj5tdnd27NhBSUkJAwYMqNdx9b+7hSgsyOenU4ZybM/O3Pr4m0DVKVFN9X+xtsQP0L5t8E2HP/vKsKY5obQae/fubdGJX1IzM7p3786HH9b/a6mV/FuQSSmmX0LT9Pwlc5684h8pyG/988+V+Fuvhv7tlPxbpJrjPo35P1mfi7/SNKIX3kVag9bfVYkJ9fwlLkpKShg/fjwDBw7k6KOPZtasWezbV/ktnt/97nfp3bs3Bw9m5k79+++/n/Xrqy4zdtlll7F69eoUezTM3Llz+clPftKofb///e+zalXTfLeNkn8zG/O5w+jdrX2D91PulzhwdyZOnMiECRN48803efPNN/n000+54oorgGCpg2XLltG3b98mS8bVVU/+O3fuZO3atZx22mk1tj1w4EBGYkjl0ksv5aabbmqSYyn5N7NDOrat97bRMbyG9Pw7hhdtRVqbVatWUVhYyL/+678CkJ+fzy233MLChQspLS1l9erVDB48mJkzZ/K73/2uYr8PP/yQsWPHMnz4cC666CL69evH9u3bAfjNb37DyJEjGTZsGBdddFFFwu7UqRNz5sxh6NChjBo1ivfff59nnnmG5cuX84Mf/IBhw4axefNmlixZwrhx4yrO1b9/f66//nrGjBnD73//e1auXMno0aMZPnw4U6ZMobS0tGK7K6+8kpEjRzJy5Eg2bar5zZV33XUXJ510EkOHDmXSpEns2bOH3bt3M2DAAMrKgunWu3bton///pSVldGvXz927NjBe++9l/bvWmP+WVC+vMLSmaOZdMeaJPU190m1KFoyq75fzIOrnm5seCJc94d1rN+6q0mPefyRXbj2n2r/aud169YxYsSIKmVdunShf//+bNq0iSVLljBt2jTGjx/P1VdfTVlZGQUFBVx33XWcfvrpzJ49m4cffpj58+cDsGHDBu655x6efvppCgoK+Ld/+zcWLVrE9OnT+eSTTxg1ahTz5s3jiiuu4K677uKaa67hggsu4Pzzz2fy5MkAXH/99RXPyxUWFvLUU0+xfft2Jk6cyGOPPUbHjh25+eab+dnPfsYPf/jDitife+45Fi5cyGWXXcaDDz5Y5TgTJ07k29/+NgDXXHMNv/rVr7j00kspLi7moYceYsKECSxdupRJkyZVzN8fPnw4Tz/9NJMmTWrkXyKgnn8WHdG19uGfaL5vSM+/Z5dCjumm3r+0Pu6edNaKu/PZZ5+xcuVKJkyYQJcuXTj55JNZuXIlAE899RRTp04FYNy4cRxyyCEAPP7447z44oucdNJJDBs2jMcff5y33noLgLZt23L++ecDMGLECLZs2ZI0pm3bttGjR9VFMb/yla8AsHbtWtavX88pp5zCsGHDuPvuu/nrXyvXUZs2bVrFzzVranb0XnvtNU499VROOOEEFi1axLp1wUIH3/rWt/j1r38NBJ9cyj8JARx++OFs3Zr+Umix6vmX7tvPi3/9qFnP+Z3TP8dtqyo/7pUvw9xQx/Wq/a5ckaZUVw89UwYNGsTSpUurlO3atYv333+fbdu2sWvXLk444QQA9uzZQ4cOHTjvvPNSLlbo7lx44YX86Ec/qlFXUFBQ8UaTn5/P/v37kx6jffv2Ne567tixY8Xxx44dW2UIKir6RpbsTe3rX/86999/P0OHDmXBggUkEgkATjnlFLZs2cITTzzBgQMHGDx4cMU+e/fupX37hl83rC5WPf/LFr/Mhf/3XLOec0pR6u8gSNWXT/bPeProfk0Sj0hLdsYZZ7Bnzx4WLlwIBBdUL7/8cmbNmsXixYu5/fbb2bJlC1u2bOHtt99m5cqV7NmzhzFjxnDvvfcCsHLlSj766KOK4y1ZsoQPPgiWQ9+5c2eVnnkynTt3Zvfu3RWvjzvuuKTj9QCjRo3i6aefrqjfs2cPGzdurKi/5557Kn6OHj26xv67d++mV69elJWVsWjRoip106dPZ9q0aXzta1+rUr5x48YqbwaNFavk/9b20mY/Z7LRmsasqGxm3DBhMDdMSP+PLtJSmRnLli1jyZIlDBw4kO7du5OXl8f3vvc9HnnkEc4+++yKbTt27MiYMWP4wx/+wLXXXsvKlSsZPnw4K1asoFevXnTu3Jnjjz+eG2+8kbPOOoshQ4YwduxYtm3bVmsMU6dO5cc//jEnnngimzdv5rzzzqvokVfXo0cPFixYwLRp0xgyZAijRo3i9dcrV9bdt28fJ598Mrfeeiu33HJLjf1vuOEGTj75ZMaOHcvnP//5KnVf/epX+eijj6pcbygrK2PTpk0UFRXV59dZq1gN+7SE2ZLRN4OGTt/8l1Hq/Uvu69u3L8uXLwfgmWeeYdq0acyYMYOdO3dW6ZED3HfffUCQZB955BHatGnDmjVr+NOf/kS7du2AYHy+fIw+qnxWDsDkyZMrkuwpp5xSZarnMcccw+zZs/n444/p1q1bjWsDp59+Os8//3zStlxyySVce+21Vcrmzp1b8XzmzJnMnDkz6b5PPfUUkydPplu3bhVlDz74IJMnT6ZNm/RTd7ySfwuZLO/hwE5d6+u3kHBFsuYLX/hCncM0AO+88w5f/vKXOXjwIG3btuWuu+5q0jh++tOf8s4771RJxJl06aWXsmLFCv74x6oLH+/fv5/LL7+8Sc4Rq+TfkOmSmVQ+7JMqueubtkQaZuDAgbz00ksZO/7JJ9f4ivE6pZo9VB+33357xfPop50pU6Y0+pjVxSr5t4Rvsqo+22fV5V+s+CauGttmP1yJmVRTLaXla2inMVYXfLPxb9rMmPelqhdpu4d3+ebnGUf36ES/7h2r1HfrENS3a6O5+tJ8CgsL2bFjhz55tkLl6/kXFhbWe59Y9fyz5ZzBvZiz7LWK13d/YyR/ev0DDuvULun2140fxODeXfnCMd2bK0QR+vTpQ0lJSYPWhG9ue/fubVCCywX1bXP5N3nVV6ySf7ZWxuxU7WsXe3YpZOrIo1Ju36WwgG+Oqd+38Yg0lYKCgnp/C1S2JBIJTjzxxGyH0awy1WYN+2T6nEDbNnn8dMrQMAaNp4pI9in5i4jEUKySv74QRUQkEKvkn83UXz5/Qm8/ItISxCr5a9xHRCQQq+Sv1C8iEohV8m8Ryzu0hBhEJPZilfw1zVJEJJCx5G9mc83sXTN7OXycG6mbbWabzOwNMzu7tuM0aUzNdaLoOcOT6pZ5EWlJMn2H7y3u/pNogZkdD0wFBgFHAo+Z2bHufiDDsWTseu8xPTqy+cNP6heDxn1EpAXIxrDPeGCxu+9z97eBTcDILMTRJDZcP44V3z2tzu3U7xeRliTTyX+Wmb1iZv9nZoeEZb2Bv0W2KQnLMi4TY/7t2+bTtk3qX2P1nr4uO4hIS5DWsI+ZPQYckaRqDnAHcANBp/cG4KfAN0g+9J60Y2xmM4AZAD179kz5PZp1KS0tJZFI8PePP23U/rWpK6Y1a57hkMI89pceBKAvHza6HQ1R3uY4UZtzX9zaC5lrc1rJ393PrM92ZnYX8GD4sgToG6nuA2xNcfz5wHyAoqIiLy4ublSciUSC4uJi7ty4Bj7a2ahjpFIR08MPJa0fPfoLHNE1WI71n89v0lPXqrzNcaI25764tRcy1+ZMzvbpFXn5JaB8QfvlwFQza2dmA4CBwHOZiqNKTLrYKiICZHa2z3+Z2TCCIZ0twEUA7r7OzO4F1gP7gUuaY6YPQF6s7moQEUktY8nf3f+llrp5wLxMnTsV9fxFRALqC2eYZveISEuk5C8iEkNK/iIiMaTkLyISQ0r+IiIxFKvkn42Lr7reKyItUaySfyb95dqzapR1aJtPtw5tsxCNiEjtMr2kc2x0bV9Qo2z99eOyEImISN3U80/DH2aNyXYIIiKNouSfhhP6dM12CCIijaLkLyISQ0r+IiIxFJvk/6fXP+DJN7dnOwwRkRYhNsn/nuf/VvdGIiIxEZvkLyIilZT8RURiSMlfRCSGYpP8Hc92CCIiLUZskr+IiFRS8hcRiSElfxGRGFLyFxGJISV/EZEYik3yd032ERGpEJvkLyIilZT8RURiSMlfRCSGlPxFRGIoreRvZlPMbJ2ZHTSzomp1s81sk5m9YWZnR8pHmNmrYd1tZmbpxFBfut4rIlIp3Z7/a8BEYHW00MyOB6YCg4BxwC/MLD+svgOYAQwMH+PSjEFERBooreTv7hvc/Y0kVeOBxe6+z93fBjYBI82sF9DF3de4uwMLgQnpxCAiIg3XJkPH7Q2sjbwuCcvKwufVy5MysxkEnxLo2bMniUSiUcGUlpayffveRu1bm7riaWy8TaG0tDSr588GtTn3xa29kLk215n8zewx4IgkVXPc/YFUuyUp81rKk3L3+cB8gKKiIi8uLq492BQSiQSHHdYJPni/UfunUiOehx+qvb4ZJRKJrJ4/G9Tm3Be39kLm2lxn8nf3Mxtx3BKgb+R1H2BrWN4nSXnG6Q5fEZFKmZrquRyYambtzGwAwYXd59x9G7DbzEaFs3ymA6k+PYiISIakO9XzS2ZWAowGHjKzRwDcfR1wL7AeeBi4xN0PhLvNBH5JcBF4M7AinRhERKTh0rrg6+7LgGUp6uYB85KUvwAMTue8IiKSHt3h28TOGZzs2riISMuSqameLVDmr/i+9Z/nYgYDZv8x4+cSEUlHjJJ/5uXlNctKFSIiaYvRsI8Ss4hIuRglfxERKafkLyISQ0r+IiIxFKPkr/UdRETKxSj5i4hIOSV/EZEYUvIXEYkh3eSVAS/9x1jd8CUiLVpskn9zrud/SMe2zXcyEZFG0LCPiEgMKfmLiMSQkn89XD722GyHICLSpJT86+GcE3plOwQRkSYVm+Sv+3tFRCrFJvmLiEglJX8RkRhS8q8H0/1aIpJjlPxFRGIoNsnfm/MWXxGRFi42yV9ERCop+YuIxJCSfz3oeq+I5BolfxGRGEor+ZvZFDNbZ2YHzawoUt7fzD41s5fDx52RuhFm9qqZbTKz28w0kVJEpLml2/N/DZgIrE5St9ndh4WPiyPldwAzgIHhY1yaMdRLunN9BhzWsUniEBFpCdJK/u6+wd3fqO/2ZtYL6OLuazyYe7kQmJBODM3BzBjcu2u2wxARaTKZ/CavAWb2ErALuMbdnwR6AyWRbUrCsqTMbAbBpwR69uxJIpFoVCClpaXs3LG3UfsCPPvss3zw/mc1yhsbT3MoLS1t0fFlgtqc++LWXshcm+tM/mb2GHBEkqo57v5Ait22AUe5+w4zGwHcb2aDSD5xJuWIjLvPB+YDFBUVeXFxcV3hJpVIJOjevQNs/7BR+48cOZKnd70J722tUt7YeJpDIpFo0fFlgtqc++LWXshcm+tM/u5+ZkMP6u77gH3h8xfNbDNwLEFPv09k0z7A1ppHaFl0TVpEck1Ghn3MrAew090PmNnRBBd233L3nWa228xGAc8C04HbMxFDdU25uMP3zzqW0cd0b8Ijiog0r7SSv5l9iSB59wAeMrOX3f1s4DTgejPbDxwALnb3neFuM4EFQHtgRfhoVWadPjDbIYiIpCWt5O/uy4BlScqXAktT7PMCMDid82ZDh4L8bIcgItJkdIdvPRgw5/zjsh2GiEiTUfKvpy6FBdkOQUSkycQm+Ws5fxGRSrFJ/iIiUknJX0QkhpT860H3eIlIrlHyr4duHdpmOwQRkSaVyYXdWpTGXO/NzzNem3s27dtqjr+I5Bb1/GthoMQvIjlJyV9EJIaU/GuhC70ikquU/EVEYkjJX0QkhmKT/F3rO4iIVIhN8hcRkUpK/iIiMaTkLyISQ7FJ/k++uT3bIYiItBixSf4iIlJJyV9EJIaU/EVEYkjJX0QkhpT8RURiSMlfRCSGlPxFRGJIyV9EJIZyPvl/+tkB3t19MNthiIi0KGklfzP7sZm9bmavmNkyM+sWqZttZpvM7A0zOztSPsLMXg3rbjPL7FemfGfxS8x5+tNG7Wvo21xEJDel2/N/FBjs7kOAjcBsADM7HpgKDALGAb8ws/Ivw70DmAEMDB/j0oyhVs++tSOThxcRaZXSSv7uvtLd94cv1wJ9wufjgcXuvs/d3wY2ASPNrBfQxd3XeLDA/kJgQjoxZJKj7wAQkdzUpgmP9Q3gnvB5b4I3g3IlYVlZ+Lx6eVJmNoPgUwI9e/YkkUg0OKj9+/fXvVEKBw960nM2Jo7mVlpa2iribEpqc+6LW3shc22uM/mb2WPAEUmq5rj7A+E2c4D9wKLy3ZJs77WUJ+Xu84H5AEVFRV5cXFxXuDW0STwCjXwDyMszqpzz4YcAaEwczS2RSLSKOJuS2pz74tZeyFyb60z+7n5mbfVmdiFwPnCGV35XYgnQN7JZH2BrWN4nSXmLcsdXhzNz0Z+zHYaISMakO9tnHHAlcIG774lULQemmlk7MxtAcGH3OXffBuw2s1HhLJ/pwAPpxJAJZxzXM9shiIhkVLpj/j8H2gGPhjM217r7xe6+zszuBdYTDAdd4u4Hwn1mAguA9sCK8NGi6EKviOS6tJK/u3+ulrp5wLwk5S8Ag9M5b3PRPH8RyVVNOdunRWroPWSHdChIWTd5RB+O7tEx3ZBERLIu55N/Qz1wyZiUdT+ZMrQZIxERyZycX9unoY7q3iHbIYiIZJySv4hIDCn510KzfkQkVyn5J6FZPiKS65T8k1CPX0RynZJ/LfQJQERylZK/iEgM5Xzyz+z3hImItE45n/xdw/ciIjXkfPIXEZGalPxFRGJIyV9EJIaU/EVEYkjJP4k2ecGv5ZunDshyJCIimaElnZPIzzO23HRetsMQEcmYnO/5a56/iEhNOZ/8RUSkJiV/EZEYUvIXEYkhJX8RkRhS8hcRiSElfxGRGFLyFxGJISV/EZEYyvnkr3u8RERqSiv5m9mPzex1M3vFzJaZWbewvL+ZfWpmL4ePOyP7jDCzV81sk5ndZqZ7cEVEmlu6Pf9HgcHuPgTYCMyO1G1292Hh4+JI+R3ADGBg+BiXZgwiItJAaSV/d1/p7vvDl2uBPrVtb2a9gC7uvsbdHVgITEgnhjpjzOTBRURaqaYc8/8GsCLyeoCZvWRmT5jZqWFZb6Aksk1JWCYiIs2oziWdzewx4IgkVXPc/YFwmznAfmBRWLcNOMrdd5jZCOB+MxtE8uuvKTvnZjaDYIiInj17kkgk6gq3hrKysgZt35hztESlpaU505b6UptzX9zaC5lrc53J393PrK3ezC4EzgfOCIdycPd9wL7w+Ytmthk4lqCnHx0a6gNsreXc84H5AEVFRV5cXFxXuDUUrF4JDXgDaMw5WqJEIpEzbakvtTn3xa29kLk2pzvbZxxwJXCBu++JlPcws/zw+dEEF3bfcvdtwG4zGxXO8pkOPJBODE2pf/cO2Q5BRKRZpPtNXj8H2gGPhjM214Yze04Drjez/cAB4GJ33xnuMxNYALQnuEawovpBm1JD5pF+b+yxGYtDRKQlSSv5u/vnUpQvBZamqHsBGJzOeTMlT7cciEhM5PwdviIiUpOSv4hIDCn5i4jEkJJ/hIb8RSQulPxFRGJIyV9EJIZyPvlrxWgRkZpyPvmLiEhNSv4iIjGk5C8iEkNK/iIiMaTkH+H62i8RiYmcT/6ujC4iUkPOJ38REalJyV9EJIZyPvnrJi8RkZpyPvmLiEhNSv4R+pAgInGh5C8iEkNK/iIiMaTkLyISQ0r+IiIxpOQvIhJDOZ/8NYFHRKSmnE/+IiJSU6yTvxmcedzh2Q5DRKTZxTv5ZzsAEZEsiXXyFxGJq7SSv5ndYGavmNnLZrbSzI6M1M02s01m9oaZnR0pH2Fmr4Z1t5lWXhMRaXbp9vx/7O5D3H0Y8CDwQwAzOx6YCgwCxgG/MLP8cJ87gBnAwPAxLs0YGk3vOyISV2klf3ffFXnZESj/2qzxwGJ33+fubwObgJFm1gvo4u5rPPiKrYXAhHRiSEcQgkVeZysSEZHm1SbdA5jZPGA68HfgH8Pi3sDayGYlYVlZ+Lx6eapjzyD4lABQamZvNDLMw4DtySp+FXl+wc2NPHrLlLLNOUxtzn1xay+k3+Z+yQrrTP5m9hhwRJKqOe7+gLvPAeaY2WxgFnAtySfSeC3lSbn7fGB+XTHWxcxecPeidI/TmqjN8RC3NsetvZC5NteZ/N39zHoe67fAQwTJvwToG6nrA2wNy/skKRcRkWaU7myfgZGXFwCvh8+XA1PNrJ2ZDSC4sPucu28DdpvZqHCWz3TggXRiEBGRhkt3zP8mM/sH4CDwV+BiAHdfZ2b3AuuB/cAl7n4g3GcmsABoD6wIH5mW9tBRK6Q2x0Pc2hy39kKG2myuKS4iIrGjO3xFRGJIyV9EJIZyOvmb2bhweYlNZnZVtuNJh5n9n5l9YGavRcoONbNHzezN8OchkbpWv7yGmfU1sz+Z2QYzW2dm3w3Lc7bdZlZoZs+Z2V/CNl8XludsmwHMLN/MXjKzB8PXud7eLWGsL5vZC2FZ87bZ3XPyAeQDm4GjgbbAX4Djsx1XGu05DRgOvBYp+y/gqvD5VcDN4fPjw/a2AwaEv4f8sO45YDTBPRcrgHOy3bZa2twLGB4+7wxsDNuWs+0O4+sUPi8AngVG5XKbw1j/nWC6+IMx+be9BTisWlmztjmXe/4jgU3u/pa7fwYsJlh2olVy99XAzmrF44G7w+d3U7lURqtYXqMu7r7N3f8cPt8NbCC4Izxn2+2B0vBlQfhwcrjNZtYHOA/4ZaQ4Z9tbi2Ztcy4n/97A3yKva11KopXq6cG9E4Q/y7+ZJlXbe9OA5TVaEjPrD5xI0BPO6XaHQyAvAx8Aj7p7rrf5v4ErCKaMl8vl9kLwhr7SzF4Ml7GBZm5z2mv7tGANWkoixzTJ8hothZl1ApYCl7n7rlqGNXOi3R7cEzPMzLoBy8xscC2bt+o2m9n5wAfu/qKZFddnlyRlraa9Eae4+1YzOxx41Mxer2XbjLQ5l3v+qZaYyCXvhx/9CH9+EJbnzPIaZlZAkPgXuft9YXHOtxvA3T8GEgTLnudqm08BLjCzLQRDs6eb2W/I3fYC4O5bw58fAMsIhqmbtc25nPyfBwaa2QAza0vw/QLLsxxTU1sOXBg+v5DKpTJyYnmNMMZfARvc/WeRqpxtt5n1CHv8mFl74EyCZVNyss3uPtvd+7h7f4L/o6vc/WvkaHsBzKyjmXUufw6cBbxGc7c521e9M/kAziWYIbKZYBXSrMeURlt+B2yjclnsbwLdgceBN8Ofh0a2nxO2+w0iMwCAovAf2mbg54R3ebfEBzCG4GPsK8DL4ePcXG43MAR4KWzza8APw/KcbXMk3mIqZ/vkbHsJZiD+JXysK89Nzd1mLe8gIhJDuTzsIyIiKSj5i4jEkJK/iEgMKfmLiMSQkr+ISAwp+YuIxJCSv4hIDP1/xIynlKrjN3gAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# create taxi environment\n",
    "env = gym.make(\"Taxi-v3\")\n",
    "\n",
    "# create a Q Learning agent\n",
    "agent = QLearningAgent(alpha=0.25, epsilon=0.2, gamma=0.99,\n",
    "                       get_possible_actions=lambda s : range(env.nA))\n",
    "\n",
    "#train agent with replay buffer and get rewards for episodes\n",
    "rewards = train_agent(env, agent, episode_cnt=5000, \n",
    "                      replay_buffer = ReplayBuffer(512))\n",
    "\n",
    "#plot reward graph\n",
    "plot_rewards(\"Taxi\", rewards, 'QAgent(replay)')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot rewards\n",
    "def plot_rewards_compare(env_name, rewards, labels):\n",
    "    for i in range(len(rewards)):\n",
    "        reward_list = rewards[i]\n",
    "        label = labels[i] + '. mean={:.1f}'.format(np.mean(reward_list[-20]))\n",
    "        plt.title(\"env={}\".format(env_name))\n",
    "        plt.plot(rewards[i], label=label)\n",
    "    plt.grid()\n",
    "    plt.legend()\n",
    "    plt.ylim(-300, 0)\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion\n",
    "\n",
    "Q agent with replay buffer is supposed to improve the initial convergence by sampling repeatedly from the buffer. Sample efficiency will become more apparent when we look at DQN. Over long run, there won't be any significant difference between the optimal values learnt with or without Replay Buffer. It has another advantage of breaking correlation between samples. This aspect will also become apparent when we look at Deep Learning with Q-Learning i.e. DQN. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
